Decoupled Training for Long-Tailed Classification With Stochastic Representations
Decoupling representation learning and classifier learning has been shown to
be effective in classification with long-tailed data. There are two main
ingredients in constructing a decoupled learning scheme: 1) how to train the
feature extractor for representation learning so that it provides generalizable
representations, and 2) how to re-train the classifier so that it constructs
proper decision boundaries while handling the class imbalance in long-tailed
data. In this
work, we first apply Stochastic Weight Averaging (SWA), an optimization
technique for improving the generalization of deep neural networks, to obtain
better generalizing feature extractors for long-tailed classification. We then
propose a novel classifier re-training algorithm based on stochastic
representations obtained from SWA-Gaussian, a Gaussian-perturbed SWA, and a
self-distillation strategy that can harness the diverse stochastic
representations based on uncertainty estimates to build more robust
classifiers. Extensive experiments on CIFAR10/100-LT, ImageNet-LT, and
iNaturalist-2018 benchmarks show that our proposed method improves upon
previous methods both in terms of prediction accuracy and uncertainty
estimation.
Comment: ICLR 202
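
For reference, the first ingredient here is plain Stochastic Weight Averaging:
keep a running average of the weights visited late in training and use that
average as the final feature extractor. The following is only a minimal
PyTorch-style sketch of that generic update, not the authors' code;
`swa_start`, `loss_fn`, and the training loop are assumptions made for
illustration.

import copy
import torch

def train_with_swa(model, train_loader, optimizer, loss_fn, num_epochs, swa_start):
    swa_model = copy.deepcopy(model)  # running average of the weights
    n_averaged = 0
    for epoch in range(num_epochs):
        for x, y in train_loader:
            optimizer.zero_grad()
            loss_fn(model(x), y).backward()
            optimizer.step()
        if epoch >= swa_start:
            # Incremental mean over the weights collected after `swa_start`.
            with torch.no_grad():
                for p_avg, p in zip(swa_model.parameters(), model.parameters()):
                    p_avg.mul_(n_averaged / (n_averaged + 1)).add_(p / (n_averaged + 1))
            n_averaged += 1
    # BatchNorm statistics should be recomputed for swa_model before evaluation.
    return swa_model
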
SWAMP: Sparse Weight Averaging with Multiple Particles for Iterative Magnitude Pruning
Given the ever-increasing size of modern neural networks, the significance of
sparse architectures has surged due to their accelerated inference speeds and
minimal memory demands. When it comes to global pruning techniques, Iterative
Magnitude Pruning (IMP) still stands as a state-of-the-art algorithm despite
its simple nature, particularly in extremely sparse regimes. In light of the
recent finding that two successive matching IMP solutions are linearly
connected without a loss barrier, we propose Sparse Weight Averaging with
Multiple Particles (SWAMP), a straightforward modification of IMP that achieves
performance comparable to an ensemble of two IMP solutions. For every
iteration, we concurrently train multiple sparse models, referred to as
particles, using different batch orders but the same matching ticket, and then
weight-average these models to produce a single mask. We demonstrate that our
method consistently outperforms existing baselines across different sparsities
through extensive experiments on various datasets and neural network
architectures.
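
As a rough illustration of the procedure described above (one pruning round
with particle averaging), the sketch below uses our own assumed data layout:
`ticket` and `masks` are dicts of tensors sharing the same keys, and
`train_fn` is a hypothetical routine that trains a masked network with a given
batch-order seed. It is not the authors' implementation.

import copy
import torch

def swamp_style_iteration(ticket, masks, train_fn, num_particles=4, prune_ratio=0.2):
    # Train several "particles" from the same masked ticket, differing only in
    # the batch order (seed), then take a uniform average of their weights.
    particles = [train_fn(copy.deepcopy(ticket), masks, seed=s) for s in range(num_particles)]
    averaged = {n: sum(p[n] for p in particles) / num_particles for n in ticket}

    # Global magnitude pruning on the averaged weights: remove the smallest
    # prune_ratio fraction of the currently surviving weights to form one new mask.
    surviving = torch.cat([averaged[n][masks[n].bool()].abs().flatten() for n in averaged])
    threshold = torch.quantile(surviving, prune_ratio)
    new_masks = {n: (averaged[n].abs() >= threshold).float() * masks[n] for n in masks}
    return averaged, new_masks
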
Martingale Posterior Neural Processes
A Neural Process (NP) estimates a stochastic process implicitly defined with
neural networks given a stream of data, rather than pre-specifying an
already-known prior such as a Gaussian process. An ideal NP would learn everything
from data without any inductive biases, but in practice, we often restrict the
class of stochastic processes for the ease of estimation. One such restriction
is the use of a finite-dimensional latent variable accounting for the
uncertainty in the functions drawn from NPs. Some recent works show that this
can be improved with a more "data-driven" source of uncertainty, such as
bootstrapping. In this work, we take a different approach based on the
martingale posterior, a recently developed alternative to Bayesian inference.
For the martingale posterior, instead of specifying prior-likelihood pairs, a
predictive distribution for future data is specified. Under specific conditions
on the predictive distribution, it can be shown that the uncertainty in the
generated future data actually corresponds to the uncertainty of the implicitly
defined Bayesian posteriors. Based on this result, instead of assuming any form
of the latent variables, we equip a NP with a predictive distribution
implicitly defined with neural networks and use the corresponding martingale
posteriors as the source of uncertainty. The resulting model, which we name the
Martingale Posterior Neural Process (MPNP), is demonstrated to outperform
baselines on various tasks.
Comment: ICLR 202
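
To make the mechanism more concrete, here is a purely illustrative sketch of
the martingale-posterior idea described above: sample hypothetical future
observations from a learned predictive distribution, append them to the
context, and let the variability across completed datasets play the role of
posterior uncertainty. The module `predictive_net`, its signature, and the
Gaussian form of its output are our assumptions, not the MPNP architecture
itself.

import torch

def martingale_posterior_predict(predictive_net, x_ctx, y_ctx, x_tgt,
                                 num_future=32, num_samples=8):
    # predictive_net(x_ctx, y_ctx, x_query) is assumed to return the mean and
    # log-std of a Gaussian predictive distribution over the query outputs.
    preds = []
    for _ in range(num_samples):
        # Sample hypothetical future observations conditioned on the context.
        x_fut = torch.rand(num_future, x_ctx.shape[-1])
        mu, log_sigma = predictive_net(x_ctx, y_ctx, x_fut)
        y_fut = mu + log_sigma.exp() * torch.randn_like(mu)

        # Condition on context + generated future data and predict the targets.
        x_aug = torch.cat([x_ctx, x_fut], dim=0)
        y_aug = torch.cat([y_ctx, y_fut], dim=0)
        mu_tgt, _ = predictive_net(x_aug, y_aug, x_tgt)
        preds.append(mu_tgt)

    preds = torch.stack(preds)               # [num_samples, num_targets, y_dim]
    return preds.mean(dim=0), preds.var(dim=0)
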